Parallel Implementation of Classification Algorithms Based on Cloud Computing Environment

نویسندگان

  • Lijuan Zhou
  • Hui Wang
  • Wenbo Wang
چکیده

As an important task of data mining, Classification has been received considerable attention in many applications, such as information retrieval, web searching, etc. The enlarging volumes of information emerging by the progress of technology and the growing individual needs of data mining, makes classifying of very large scale of data a challenging task. In order to deal with the problem, many researchers try to design efficient parallel classification algorithms. This paper introduces the classification algorithms and cloud computing briefly, based on it analyses the bad points of the present parallel classification algorithms, then addresses a new model of parallel classifying algorithms. And it mainly introduces a parallel Naïve Bayes classification algorithm based on MapReduce, which is a simple yet powerful parallel programming technique. The experimental results demonstrate that the proposed algorithm improves the original algorithm performance, and it can process large datasets efficiently on commodity hardware.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...

متن کامل

Assessment Methodology for Anomaly-Based Intrusion Detection in Cloud Computing

Cloud computing has become an attractive target for attackers as the mainstream technologies in the cloud, such as the virtualization and multitenancy, permit multiple users to utilize the same physical resource, thereby posing the so-called problem of internal facing security. Moreover, the traditional network-based intrusion detection systems (IDSs) are ineffective to be deployed in the cloud...

متن کامل

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...

متن کامل

Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the Kuda environment (Research Article)

Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic war, sonar, etc. The beamforming methods like Minimum Variance Distortionless Response (MVDR), Delay-and-Sum (DAS), and subspace-based Multiple Signal Classification (MUSIC) are the most known DOA estimation techniques. The mentioned methods have high computational complexity. Hence using...

متن کامل

GASA: Presentation of an Initiative Method Based on Genetic Algorithm for Task Scheduling in the Cloud Environment

The need for calculating actions has been emerged everywhere and in any time, by advancing of information technology. Cloud computing is the latest response to such needs. Prominent popularity has recently been created for Cloud computing systems. Increasing cloud efficiency is an important subject of consideration. Heterogeneity and diversity among different resources and requests of users in ...

متن کامل

Data Replication-Based Scheduling in Cloud Computing Environment

Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012